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Abstract. Quantum computers can in principle simulate quantum physics 
exponentially faster than their classical counterparts, but some technical hurdles 
remain. Here we consider methods to make proposed chemical simulation 
algorithms computationally fast on fault-tolerant quantum computers in the 
circuit model. Fault tolerance constrains the choice of available gates, so that 
arbitrary gates required for a simulation algorithm must be constructed from 
sequences of fundamental operations. We examine techniques for constructing 
arbitrary gates which perform substantially faster than circuits based on the 
conventional Solovay-Kitacv algorithm [CM. Dawson and M.A. Nielsen, Quantum 
Inf. Comput., 6:81, 2006]. For a given approximation error e, arbitrary single- 
qubit gates can be produced fault-tolerantly and using a limited set of gates in time 
which is O(loge) or O(logloge); with sufficient parallel preparation of ancillas, 
constant average depth is possible using a method we call programmable ancilla 
rotations. Moreover, we construct and analyze efficient implementations of first- 
and second-quantized simulation algorithms using the fault-tolerant arbitrary 
gates and other techniques, such as implementing various subroutines in constant 
time. A specific example we analyze is the ground-state energy calculation for 
Lithium hydride. 



PACS numbers: 03.67.Ac, 03.67.Lx, 31.15.A- 
1. Introduction 

Simulating quantum physics is arguably one of the most important applications of 
a quantum computer — a problem whose solution is both intractable for classical 
computers and valuable to science [l]. The objective of quantum simulation 
is to model natural physical systems with Hamiltonians that permit a compact 
representation [2^3] . In this investigation, we narrow our focus to quantum chemistry 
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problems such as calculating the eigenvalues of a molecular Hamiltonian [4j|7j . We aim 
to demonstrate constructively how quantum computers can simulate chemistry with 
an efficient use of resources. By doing so, we indicate how close the field of quantum 
information processing is to solving novel problems for less computational cost than 
a classical computer. 

Quantum chemistry and band structure calculations account for up to 30% 
of the computation time used at supercomputer centers [8j. The most-employed 
methods include density functional theory and polynomially-tractable approximate 
quantum chemistry methods [9]. Despite the success of these methods, for example, 
in simulating the dynamics of a small protein from first principles 10 or in predicting 
novel materials 
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they are still approximate, and much work is carried out in 
developing more accurate methods. Quantum simulators offer a fresh approach 
to quantum chemistry 12 as they are predicted to allow for the exact simulation 
(within a basis) of a chemical system in polynomial time. A quantum computer of 
a sufficient size, say 128 logical quantum bits [4 13 , would already outperform the 
best classical computers for exact chemical simulation. This would open the door to 



high-quality ah initio data for parameterizing force fields for molecular dynamics 14 



or understanding complex chemical mechanisms such as soot formation 15 
a number of different chemical species must be compared 



where 



This tends to suggest 
that computational chemistry would be one of the first novel applications of universal 
quantum computers. 

The motivation behind our study is that in order for computational physics 
on quantum computers to be useful as a scientific tool, it must have an efficient 
implementation. Often general algorithmic complexity such as "polynomial time" is 
taken as a by-word for efficient, but we go deeper to show the substantial performance 
disparities between different polynomial-time algorithms, revealing which ones are 
significantly more efficient in space and time resources than their peers. By introducing 
algorithmic improvements and making quantitative analysis of the resource costs, we 
show that simulating quantum chemistry is feasible in a practical execution time, 
such as simulating the ground state energy of Lithium hydride (LiH) in ~ 5.6 hours 
on a hypothetical fault-tolerant quantum computer with an execution time per error- 
corrected gate of 1 ms. Additionally, from an information theory perspective, it is 
interesting to see what quantum computational complexity is required to simulate 



physically-relevant Hamiltonians in general 16 



Several possible simulators have been proposed and studied [12||17f|20| , but we 
focus on fault-tolerant circuit-model quantum simulation in this investigation [2j[4j[n 



13 21 24 . The reasons for these constraints are straightforward: quantum computers 
will probably be sensitive to noise and other hardware errors, thus requiring fault 
tolerance 
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and fault-tolerant quantum computing has been most successfully 
applied in the circuit-model. Fault tolerance requires an overhead of additional work 
for the quantum computer; error correcting codes and the mechanisms they use 
to correct errors have been studied previously 25 - 27 . We focus here on another 



matter critical to simulation algorithms, which is making arbitrary fault-tolerant 
gates. Arbitrary quantum operations, such as a single-qubit rotation of arbitrary angle 
around the Z-axis on the Bloch sphere, are typically constructed using a sequence of 
primitive error-corrected gates 26 ( 28 ( 29 . Quantum simulation depends sensitively on 
the execution time of arbitrary gates of this form, so one of the core contributions of 
this paper is to demonstrate efficient constructions for such gates, which would allow 
simulation of more complex systems under a fixed-resource constraint. 



Simulating chemistry efficiently on fault-tolerant quantum computers 



Simulated Evolution- 



It*- 1 ') 
| t (*-2)) 

|t<°>) 



State Preparation: 
Initialize IV'o) 





Readout 




— s 


~ QFT 





W(2- T - 2 d't) 



Figure 1. Schematic of a digital quantum simulation algorithm for energy 
eigenvalue estimation |2|4| . The three main steps are state preparation, simulated 
evolution, and readout; this investigation focuses on the middle process. After 
preparing an initial state |i/>o), the system is evolved in simulated time by solving 
the time-dependent Schrodinger equation. Note the system propagators U(2 x 8t) 
are controlled by qubits in a register representing simulated time. A quantum 
Fourier transform (QFT) on the time register provides an estimate of an energy 
eigenvalue. The accuracy of the simulation depends on suppressing errors in both 
state preparation and simulated-time evolution, which is why fault tolerance is 
an important consideration for quantum simulation algorithms. 



A digital quantum simulation algorithm consists of three primary steps (figure [T]): 
state preparation, simulated time evolution, and measurement readout. This paper 
focuses on the second step, evolving the system in simulated time, because this 
represents the core of the algorithm. Simulation of time evolution on a quantum 
computer is a sequence of quantum gates which closely approximates the evolution 
propagator U(t; t+St) — Texp Jt +5t H(r)dr^ of a desired Hamiltonian H, where 
T is the usual time-ordering operator. In the case of a time-independent Hamiltonian, 
we have U{5t) = exp (—jrUSt), as in figure 111 The increment St is a single time step 
of simulation, and a simulation algorithm often requires many time steps, depending 
on the desired result {e.g. energy eigenvalue). State preparation and measurement 
readout are necessary steps which are not discussed here, but details can be found in 



references 3, 26, 30 33 . 

The quantum simulation problem we analyze is the ground-state energy 
calculation of LiH from first principles. This was called the "chemist's workbench" and 
is an appropriate continuation of quantum computational applications of chemistry 
going beyond molecular Hydrogen 7,22,34,35 . For some of the selected methods, 
the quantum circuit is compact enough to be tractable for classical computation, so 
our chosen problem would not demonstrate the superiority of quantum computation 
by itself. Still, this example is useful for two reasons. First, the LiH simulation 
preserves the features of more complicated chemical simulations while permitting 
a simple analysis that illustrates the improved methods we propose. Second, with 
quantum computers still in early stages of development, a compact problem such 
as LiH would be a convenient choice for experimental demonstrations of quantum 
simulation in the near-term. 
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This paper provides constructive methods for simulating quantum chemistry 
efficiently using fault-tolerant quantum circuits. Section [2] describes how to 
construct quantum circuits for arbitrary phase rotations, which are essential to 
simulation. Section [3] develops a fault-tolerant simulation algorithm in second- 
quantized representation using phase rotations from the prior section; analysis of the 
computing resources required follows. Section [4] demonstrates how to construct an 
efficient chemistry simulation in first-quantized form, and total quantum resources 
are analyzed. Section [5] outlines how to determine the optimal simulation parameters 
for a given set of engineering constraints and performance objectives. The paper 
concludes by discussing the prospects for fault-tolerant quantum computers to solve 
novel simulation problems. 



2. Fault-tolerant phase rotations 



The algorithms which simulate chemistry on a circuit-model quantum computer 
require many phase rotations, accurate to high-precision. A single-qubit rotation 
gate in general form is 



Rz(4>) 



1 

e** 



(1) 



where <f> is arbitrary and o~z is the Pauli spin operator; in general, phase rotations 
are represented by diagonal unitary matrices, as shown on the RHS of Eqn. ([!]). 
Additionally, any arbitrary single-qubit gate can be produced using three distinct 
phase rotations and two Hadamard gates 26 . Making a quantum computer 



fault-tolerant constrains the available operations to a finite set of fundamental 
gates, so the arbitrary rotations needed to simulate Hamiltonian evolution must be 
constructed from a circuit of these fundamental gates. Phase rotations are needed 
at every time step of simulation, so the performance of the simulation algorithm 
depends on the computational complexity of these arbitrary gate circuits. In this 
section, we discuss three different approaches for implementing arbitrary phase gates 



efficiently: phase kickback 36-38 , which uses multi-qubit gates acting on an ancilla 



register; gate approximation sequences, such as those generated by the Solovay-Kitaev 
algorithm [26j|28] or by Fowler's algorithm [29], which are sequences of single-qubit 
gates; and programmable ancilla rotations (PARs), which compute ancillas in advance 
using one of the above methods to achieve very low circuit depth in the algorithm. 



2.1. Phase kickback 



Phase kickback 36 37 , also known as the Kitaev-Shen-Vyalyi algorithm [38], is an 
ancilla-based scheme that uses an addition circuit to impart a phase to a quantum 
register. Phase kickback relies on a resource state [7^) which can be defined by the 



inverse quantum Fourier transform (QFT) [26 39 40 
v«\ = c/ QFT t \k) = 



1 



\r 



JV-l 



2-Kiky/N 



\y) 



(2) 



The register \k) contains n qubits prepared in the binary representation of k, an odd 
integer. The state [7^) is a uniform- weighted superposition state containing the 
ring of integers from to N ~ 1, where N = 2", and each computational basis state 
has a relative phase proportional to the equivalent binary value of that basis state. 
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Figure 2. Controlled addition of the quantity u determined by Eqn. Q is 
approximately equivalent to an arbitrary phase rotation Rz (0), but the former 
uses only fault-tolerant gate primitives and ancillas. The operation © denotes 
unitary addition modulo 2", where n is the number of qubits in the |7^ fc '^) register; 
for illustration, n = 3 in the circuits above. 



This ancilla register must be produced fault-tolerantly. Ref. [38] provides a method 
to prepare (7^) using phase estimation such that k is a random odd integer; hence 
our analysis does not assume a value for k. If necessary, |Appcndix A| provides a 
technique to convert any [7^) into |7^)- The circuit complexity for creating [7^) 
is small, requiring perhaps a few thousand gates, so the cost of this initialization step 
is negligible compared to quantum algorithms we analyze later. 

One could also view the \"f^) state as a discretely-sampled plane wave with 
wavenumber k. Consider then that (7^) is an eigenstate of the unitary operation 
£/®„ \m) = \m + u (mod N)) for modular addition, so that 

N-l 

£/ ffiu | 7 W ) = -j= E e^ lk{u ~ v)IN \y) = e Mku ' N \^), (3) 

^ N y =0 

where © denotes addition modulo N and u is an integer. Moreover, the eigenvalue 
of modular addition on |7^ fe ') is a phase factor proportional to the number u added. 
Note that the addition operation Uq u is readily implemented with a fault-tolerant 



quantum circuit 41-45 . To determine the value of u in the addition circuit which 



approximates a phase rotation Rz((f>), one solves the modular equation 



ku 



X 

2tt 



(mod AO, (4) 



which always has a solution since k is odd and A^ is a power of 2. The operation \_x] 
denotes rounding any real x to the nearest integer; any arbitrary rule for half-integer 
values suffices here. By proper selection of u, one can approximate any phase rotation 
to within a precision of \A(f>\ < ^stt radians, where Acj) = \d> — jj-ku (mod 2tt)] . We 
can now understand how the method received its name: since |7^ fc -*) is an eigenstate 
of addition, when an integer u is added (using an addition circuit) to this register, a 
phase is "kicked back." This method is quite versatile, as several different types of 
phase gates are developed using phase kickback in this work. 

Single-qubit phase rotations using phase kickback are constructed with a 
controlled addition circuit, as shown in figure [2] Intuitively, a phase is kicked back to 
the control qubit if it is in the 1 1) state, which is equivalent to the phase rotation in 
Eqn. 0. The accuracy of the phase gate and the quantum resources required depend 
on the number of bits in the ancilla state (7^). After solving Eqn. (|4b, the integer u 



is added to |7*-' c '') using a quantum adder controlled by the qubit which is the target 
of the phase rotation. There are various implementations of quantum adder circuits 
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which have tradeoffs in performance between circuit depth and circuit size 41-45 . 
Since |7^) is not altered by phase kickback, the number of such registers required 
for a quantum algorithm is equal to the maximum number of phase rotations which 
are computed in parallel at any point in the algorithm. 

2.2. Gate approximation sequences 

A gate approximation sequence uses a stream of fault-tolerant single-qubit gates to 
approximate an arbitrary phase rotation, such as that in Eqn. (JlJ. For context, a 
common set of fault-tolerant gates is listed in Table [T] below. Such sequences must be 
calculated using a classical algorithm, and at least two options exist. The Solovay- 



Kitaev algorithm 26 28 is perhaps the best known method for generating arbitrary 
quantum operations, so it will serve as a benchmark in our analysis. A subsequently- 
derived alternative, Fowler's algorithm [29], offers shorter gate sequences for a 
given approximation accuracy, with some notable drawbacks in classical algorithmic 
complexity. 

The efficiency of a gate approximation sequence is determined by the accuracy 
of approximation (i.e. how close the composite sequence is to the desired gate) as a 
function of resource costs. Both the Solovay-Kitaev and Fowler algorithms produce 
better approximations if one can afford more quantum gates; however, quantum 
resources are expensive, so we must implement finite-length sequences which produce 
a sufficiently good approximation. We adopt the distance measure in Ref. 29 to 
determine approximation accuracy: 



iietd{u , v) = r ^Mp21, (5) 

where d is the dimensionality of U and V (e.g. d = 2 for a single-qubit rotation). At 
the end of this section, we provide a quantitative analysis of resource costs to produce 
phase rotations. What is sufficient for the moment is to know that, if we denote 
the approximation error as f = dist2(f, U applOK ), the corresponding approximating 
sequence t/ approx has asymptotic length O (poly (log e)), a result known as the Solovay- 



Kitaev theorem 26 . 



2.3. Programmable ancilla rotation 

We introduce a third method for producing phase rotations, the programmable 
ancilla rotation (PAR), which pre-computes ancillas before they are needed. Shifting 
the computing effort to a different point in the quantum circuit (assuming parallel 
computation) allows this method to achieve constant average depth in the algorithm 
for any desired accuracy of rotation, which can be as small as 4 quantum gates. 
The pre-calculated ancillas still require quantum circuits of similar complexity to the 
previously discussed methods, so this approach is best-suited to a quantum computer 
with many excess qubits for parallel computing. 

The PAR is based on a simple circuit which uses a single-qubit ancilla to make 
a phase rotation, which is a "teleportation gate" 46 47 , as shown in figure [3j This 



circuit is probabilistic, so there is a 50% probability of enacting Rz(—<f>) instead of 
Rz(4>)', in such an event, we attempt the circuit again with angle 20, then 4</> if 
necessary, etc. This proceeds until the first observation of a positive angle rotation, 
in which case we have enacted a rotation ^totai = 2 m <f> — J^=i 2 x <fi = (p. 
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Figure 3. Probabilistic rotation using an ancilla qubit. The measurement is 
in the computational (Z) basis. The circuit enacts either Rz(<j)) or Rzi—'t') with 
equal probability. The X gate is classically conditioned on the measurement result. 
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Figure 4. Programmable ancilla rotation (PAR) circuit. The bulk of the 
computing effort is shifted to an earlier part of the circuit, when the ancillas 
are produced. The programmed ancillas are used in multiple rounds of the circuit 
in figure [3] each of which succeeds with 50% probability. The cascading circuit 
above terminates after the first success, as denoted by the "?" decision gates. 
The average number of rounds required is 2, so by pre-computing the ancillas, 
this method contributes very few additional gates to an algorithm's circuit depth. 



The circuit for the PAR is shown in figure [ij The programmed ancillas j^ 1 -*) = 
(|0> |1)), \ujW) = ^ (|0) + e i ^ |1)), etc. are pre-computed using one of the 



methods above for a phase rotation. A very similar method was shown in Ref. 48 
but we generalize here from cj) = ^ to arbitrary rotation angles. In practice, phase 
kickback may be preferable for producing the pre-computed ancillas since reusing the 
same \ ancilla does not introduce additional errors into the circuit. The cascading 
series of probabilistic rotations continues until the desired rotation is produced or the 
programmed ancillas are exhausted. For practical reasons, one may only calculate a 
finite number of the PAR ancillas, and if all such rotations fail, then a deterministic 
rotation using phase kickback or a gate approximation sequence is applied. The 
probability of having to resort to this backstop is suppressed exponentially with the 
number of PAR ancillas pre-computed. 

The average number of rounds of the circuit in figure[4]before a successful rotation 
is simply given by J2m=i — 2. The X gate in each round can be performed with 



a Pauli frame 491-51 , so counting measurement as a gate, the number of gates per 



round is 2, and the average number of gates per PAR is 4. With a finite number 
of pre-computed ancillas M, there is a probability 2~ M of having to implement the 
considerably more expensive (in circuit depth) deterministic rotation. Nevertheless, 
if the computer supports the ability to calculate the programmed ancillas in advance, 
the PAR produces phase rotations that are orders of magnitude faster than other 
available methods, which also leads to faster simulation algorithms. 
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Symbol 


Name 


Matrix Representation 


X, Y, Z 


Pauli gates 




" 1 " 
1 




-i 1 [ 1 

1 J ' L 1 J 


H 


Hadamard 


i [ 1 1 1 

V2 [ 1 -1 J 


g 


7T / 4 nVlSQP (TJ}tP 




" 1 " 

i 




T 


7r/8 phase gate 




" 1 

e™/ 4 




CNOT 


Controlled-NOT 
(two-qubit gate) 




" 1 " 
10 
1 
10 





Table 1. Universal set of fault-tolerant gates in this investigation. 



2.4- Analysis of a single- qubit phase rotation 

We begin our quantitative analysis by examining fault-tolerant single-qubit phase 
rotations. We construct rotations using phase kickback, the Solovay-Kitaev algorithm, 
Fowler's algorithm, and PARs. In each case, we determine the depth of the quantum 
circuit and the types of fault-tolerant gates required. The techniques developed here 
will be used in the more complicated phase rotations for the simulation algorithms in 
Sections [3] and |4] 

To assess the performance of quantum circuits, let us assume the following 
simplified quantum computing model. The hypothetical system uses fault-tolerant 
quantum error correction, so we presume the quantum gates are ideal. The 
quantum computer only has access to a limited set of "fundamental" gates, which 
are summarized in Table [T] this set of gates is typical for a fault-tolerant quantum 
computer |26j|48j|5l][52] . We allow full parallelism so that gates can be applied to all 
qubits simultaneously, as long as the two-qubit (CNOT) gates do not overlap. Because 
the fundamental gate set has a finite number of members, phase kickback or gate 
approximation sequences are required to produce approximations to arbitrary gates. 
We should note that each logical gate with error correction will require many more 



physical operations to implement 25 48 51 , but we purposefully avoid these details 



so that our present analysis is independent of hardware and error correction models. 

When benchmarking the performance of a phase rotation, the important figures 
are the quantum resources consumed to achieve a given accuracy of approximation. 
Using the distance measure in Eqn. ([5]), the approximation error is quantified as 

e = dist 2 (Rz U appmx ) , (6) 

where U a p pTax is the circuit approximating Rz{<fi)- Figure [5] reports two quantum 
resources for a single-qubit rotation: circuit depth, which is the minimum execution 
time in gates; and the total number of T gates required (see Table [l]) . T gates are 
significantly more expensive to prepare fault-tolerantly than other fundamental gates 



in many prominent error-correcting codes 26 52 , so they represent an important 



consideration for large-scale quantum computing 21 48 51 . It is apparent from 
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Figure 5. Color. Quantum computing resources required to produce a fault- 
tolerant singlc-qubit phase rotation to accuracy e = dist2 (i?z(</>), £/ a pprox) using 
various methods, (top) Circuit depth for single-qubit rotations, (bottom) 
Number of T gates required for each rotation. There is variation in the resources 
required for Solovay-Kitaev sequences, Fowler sequences, and PARs; each point 
is the mean number of gates required, and where applicable, the bars show 
plus/minus one standard deviation. The Solovay-Kitaev data is averaged over 
9534 random angles (<j>), and the Fowler data is averaged over 98 random angles 
per point. Fowler sequences are numerically intensive to calculate, so curves fit 
to the data are shown for e < 10~ 3 : depth = — 24.91og 10 e — 7.64 and T gates 
= — 9.751og 10 e — 2.81. Phase kickback is implemented here with a ripple-carry 
adder |44|. PARs use six pre-computed ancillas. Solovay-Kitaev sequences were 
calculated using code written by Dawson [28] ; Fowler sequences were calculated 
using code written by Fowler. 
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Method 


Description 


Advantages 


Disadvantages 


Phase 
kickback 


Approximates 
arbitrary phase 
rotation via 
controlled addition 
applied to 7^) 
ancilla register. 


Trivial to compile. 
Circuit depth is 
O(loge) or 
(9(log loge), 
depending on 
adder circuit. 


Requires a logical 
ancilla register 
consisting of 
O(loge) qubits. 
Resource costs are 
about 2-3 x higher 
than Fowler 
sequences. 


Solovay-Kitaev 
sequence 


Approximates 
arbitrary rotation 
with a sequence ol 
fundamental gates. 
Depth is <3(log c e), 
with c « 4. 


Polynomial-time 
compiling 
algorithm. No 
logical ancilla 
states. 


Dramatically more 
expensive in 
quantum resources 
than alternatives. 


Fowler 
sequence 


Approximates 
arbitrary rotation 
with a sequence of 
fundamental gates. 
Depth is O(loge). 


Minimal-depth 
sequences. No 
logical ancilla 
states. 


Sequence- 
determination 
algorithm has 
exponential 
complexity and 
becomes infeasible 
for high-accuracy 
rotations. 


Programmable 
ancilla 
rotation 
(PAR) 


Approximates 
arbitrary rotation 
with a probabilistic 
circuit using ancilla 
and measurement. 


Constant average 
depth (4 gates) for 
any phase rotation. 


Requires logical 
ancillas which must 
be pre-computed. 



Table 2. Summary of methods for producing fault-tolerant phase rotations. The 
quantity e is the accuracy of an approximate rotation, and it is defined by Eqns. (J5J 
and (RjJ. 



figure [3] that Solovay-Kitaev sequences are substantially more expensive than their 
counterparts in both circuit depth and T gates. Fowler sequences are very compact 
and, in fact, optimal for an approximation sequence, but the classical algorithm to 
calculate them requires a calculation time that appears to grow exponentially faster 
than the other methods: e < 10~ 2 requires minutes, e < 10~ 3 requires about an 
hour, and e < 10~ 4 requires about a day, for each rotation, on a modern workstation. 
For these reasons, phase kickback may be the method of choice when high-precision 
(e < 10~ 6 ) rotations are required. Phase kickback requires resources comparable to 
Fowler sequences, but the quantum circuit depends on adders, which are trivial to 
compile. The methods we analyze for producing fault-tolerant phase rotations are 
summarized in Table [2) 

3. Simulating chemistry in second-quantized representation 

Simulation in the second-quantized form expresses the electronic Hamiltonian T-L in 
terms of the creation operators a p ^ and the wavefunction in terms of fermionic (or 
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Figure 6. Excitation operator e ifl i2(.a.x^ a 2 +ai^ ai)St encoc ied into a quantum 
circuit 0. Above, = h 12 St. The gate R x (-n/2) = H • 3+ • H is available 
from the set in Table [I] In this example, the control qubit \t) is used for phase 
estimation, and the qubits and |x2 ) are basis functions (e.g. molecular 

orbitals). The controlled phase rotations CRz{8) must be approximated using 
circuits of available fault- tolerant gates. 



bosonic) modes \p) = a p ^ |0) (i.e., occupation number representation). In chemistry, 
the single-electron molecular orbital picture has provided a practical method for 
approximating an A-electron wavefunction. Using second-quantized algorithms, basis 
sets in computational chemistry can be imported directly into quantum computational 
algorithms. For this reason, both theoretical [4j[5j[7] and experimental [22| [35] 
investigations in second-quantization have been performed. 

Following the standard construction (see e.g. Ref. [12]), an arbitrary molecular 
Hamiltonian in second-quantized form can be expressed as 



p,q p,q,r,s 



(7) 



where h pq = (p\(T +Vpf)\q) are one-electron integrals (T is the kinetic energy operator, 
and Vjy is the nuclear potential) and h pqrs = (pq\V e \rs) represent the Coulomb 
potential interactions between electrons. All of the terms /i pq 's and h pqrs 's are pre- 
computed numerically with classical computers, and the values are then used in the 
quantum computer to simulate the Hamiltonian evolution through the operators 



U,_ h 



-ih P q(a p * a q +a q ' a p )8t 



and 



Upqrs € 



-ih pqrB {a p ^ a q ^ a r a s -\-a s ^ a r ^ a q a p )St 



(8) 



(9) 



These operators are constructed with a Jordan- Wigner transform and an arbitrary 
controlled phase gate CRz(4>) [7], as shown in figure[6j The Jordan- Wigner transform 
requires H, S, and CNDT gates, which are often readily available in fault-tolerant 
settings, so we focus first on the considerably more resource-intensive controlled phase 
rotations. We later show how to implement the Jordan- Wigner transform efficiently. 



3.1. Controlled phase rotations 

As can be seen in figure [6j when U pq or U pqrs is implemented in a controlled operation 
(such as in energy eigenvalue estimation, see also figure [I]) , the core component of the 
circuit is a controlled phase rotation, 



CR Z {4>) 







P i<t> 



(10) 
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Figure 7. Decomposition of a controlled phase rotation into CNOTs and fault- 
tolerant single-qubit rotations. If the control qubit only controls other circuits, 
as in phase estimation algorithms, the third phase rotation commutes with the 
CNOTs. In such an event, the third single-qubit rotations from all decompositions 
of controlled rotations commute, and they can be combined into just one rotation 
prior to a non-commuting operation on this qubit (such as the quantum Fourier 
transform and measurement readout in figure [TJ. As a result, controlled rotations 
in phase estimation algorithms are effectively decomposed into two CNOTs and two 
single-qubit rotations with this circuit. 



\y) - ^B z {4>) 
|0> 



\y) 



|0) B Z (0) 



Figure 8. Controlled rotation CRz{<t>) ( see Eqn. \10\ ) between qubits \x) and 
\y) using two Toffoli gates, just one single-qubit rotation gate, and an ancilla |0). 
The ancilla qubit is conditionally set to |1) using a Toffoli gate, and a phase is 
imparted to this state with the rotation Rz (</>). A final Toffoli gate returns the 
ancilla qubit to state |0). 



One way to implement the controlled rotation in Eqn. ( 10 1 is to deconstruct the 
operation into CNOTs and single-qubit rotations 53 , as shown in figure [7] Another 
method requires just one single-qubit rotation, as well as an ancilla |0), as shown 
in figure [8] Ref. 26 provides a circuit decomposition for the Toffoli gate into 
gates in Table [lj We use the circuit in figure [8] (requiring just one phase rotation) 
for the remainder of this paper, because the cost of one ancilla qubit is typically 
modest compared to a phase rotation. One can implement phase kickback, gate 
approximation sequences, or PARs to produce the single-qubit rotations, as in 
Section [2~4} Additionally, the PAR construction can be modified to produce controlled 
rotations more directly. If the control qubit only controls other circuits between ancilla 
production and the time a controlled-PAR is needed, as is the case for phase estimation 
algorithms, one can create the ancillas (see figure [4]) using controlled rotations with 
one of the above methods and produce a controlled-PAR with the same cascading 
circuit. 

The different methods of producing a controlled phase rotation are analyzed in 
figure [9j We have excluded Solovay-Kitaev sequences, which permits a linearly-scaled 
vertical axis, showing that each of these methods has execution time linear in loge 
or constant. As before, the values for Fowler sequences are extrapolated. We can see 
that Fowler sequences and phase kickback are separated by approximately a factor of 
3 in execution time, and the choice between the two would be motivated by whether 
compiling the Fowler sequence is feasible or not. The PAR circuit requires one of the 
above methods to pre-compute ancillas. 
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Figure 9. Color. Circuit depth for controlled phase rotations using various 
methods. A desired controlled rotation CRz{4>) i s approximated with a fault- 



tolerant circuit C/ a 



with accuracy 



dist4 (c7 approx , CRz (</>)) using the 



method in figure [8] Solovay-Kitaev sequences are omitted here to permit 
comparison of the more efficient schemes on a linear scale. The bars on 
Fowler sequence data indicate the standard deviation taken over 98 random-angle 
rotations. The controlled-PARs have depth of 4 gates, on average, regardless of 
rotation accuracy. Phase kickback uses a ripple-carry adder since the addends 
have less than 16 bits [44) . If very high precision were desired, a carry- lookahead 
adder can achieve depth O(logloge) at the expense of additional qubits and 
parallel circuits (more T gates) [45] ■ 



3.2. Finite precision in pre- calculated integrals 



The execution time of a second-quantized simulation algorithm is proportional to the 
number of integral terms h pq and h pqrs , as indicated by Eqns. ([7jj9|. We now consider 
how to speed up the algorithm by omitting the integral terms that are negligibly 
small in magnitude. For a basis set consisting of M single-particle orbitals, the 
maximum number of integral terms is 0(Al i ). In practice, however, the effort for 
evaluating these integrals often scales somewhere between 0(M 2 ) and 0(M 3 ) with 



modern implementations 54 , because typically many integral terms may be neglected 
for being smaller in magnitude than a cutoff threshold. Consequently, the execution 
time of second-quantized simulation is determined by the number of prc-computed 
integrals of the form h pq and h pqrs of sufficiently large magnitude, as well as the 
efficiency of producing the corresponding arbitrary phase rotations in the quantum 



Simulating chemistry efficiently on fault-tolerant quantum computers 14 



12000 




10" 7 10" 6 10" 5 10"" 10 3 10" 2 10"' 10° 10 1 

Cutoff value (atomic units) 

Figure 10. Color. The number of integral terms implemented in a second- 
quantized simulation of LiH using a TZVP basis, as a function of cutoff threshold. 
Only integral terms with absolute value above the threshold are implemented in 
circuits, and the rest are neglected. As shown in the figure, a cutoff of 10 -4 would 
require the algorithm to implement just over 9000 integral terms. 



computer, such as CRz(h pq 5t) in the gate sequence for e - th p<i( a t a i+ a1 q a p) St [^j. 

To illustrate how many integral terms are present in a typical chemical problem, 
we have calculated the integrals for a second-quantized simulation of LiH. We 
performed calculations in the minimal basis and in a triple-zeta basis, using the 



GAMESS quantum chemistry package 55 56 , at a bond distance of 1.63 A, with 



an integral term cutoff of 10 in atomic units. We computed the number of 
integrals above cutoff using the STO-3G basis 57 containing 12 spin-orbitals (6 spatial 
orbitals) and the TZVP basis [58] containing 40 spin orbitals (20 spatial orbitals). The 
cumulative number of integral terms as a function of cutoff in TZVP basis is plotted 
in figure 10 With the STO-3G basis, there were 231 non-zero molecular integrals, but 
only 99 of them were greater than 10 -10 atomic units in magnitude. This is an order of 
magnitude below what is expected from 0(M 4 ) scaling. Considering the larger, more 
accurate basis set (TZVP), there were 22155 non-zero integrals, but only 10315 were 
greater than the cutoff. Figure 10 shows that a higher cutoff, such as 10 -4 , can further 
reduce the number of integrals in TZVP basis implemented in the simulation. As a 
result, the effective number of integral terms the quantum computer must implement 
as phase rotations is nearly two orders of magnitude less than the asymptotic analysis 
would suggest, an example of the over-estimation of the resource costs that can occur 
when using asymptotic estimates. This technique becomes particularly relevant in 
large molecules since distant particles interact weakly, and in such an event, many of 
the associated integral terms may be negligibly small. Raising the cutoff threshold 
impacts the accuracy of the simulation, so one must attempt to balance the resource 
costs of simulation with the usefulness of the result. 



3.3. Jordan- Wigner transform using teleportation 

The second-quantized algorithm uses Jordan- Wigner transforms to implement 
operators such as e~* hpq< ' ap1 a i+ a q i a v ) St j anc j this section shows how to perform such 
transforms in constant time. As elaborated in Ref. u\, the circuits for Jordan- Wigner 
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Figure 11. Rearrangement of the CNOT ladder common in Jordan-Wigner 
transforms using teleportation. (a) The original CNOT ladder requires an execution 
time that grows with the extent of the simulation in qubits. (b) A conceptual 
diagram of what teleportation accomplishes. The qubits "move" backwards in 
time, (c) A valid quantum circuit that uses teleportation to move qubits in a 
manner which allows parallel computation of the CNOTs. The BSM is the Bell 
state measurement which teleports the qubits; the result of this measurement 
indicates the Pauli errors which are tracked by the Pauli frame [51] . The Bell 
state |"I >+ } = ^ ( 1 00) + 1 1 1} ) can be prepared from |0) ancillas using one H gate 
and one CNOT gate. Similarly, the BSM can be implemented using one H, one CNOT, 
and measurement of the two qubits in the computational basis. 



transforms often consist of ladders of CNOT gates, such as the one in figure [TTfc ,. In 
a simulation with M basis states, these ladders can extend across the entire register 
of qubits corresponding to these basis states, which leads to the 0(M 5 ) asymptotic 
runtime quoted in Ref. 12 when there are at most 0(M 4 ) integral terms. 



The CNOT ladder is a sparse network of Clifford gates, so we show how it may 
be implemented in constant time using teleportation |46| 47 . Figure |ll| b gives an 
intuitive picture for what will be accomplished. If the path of the qubits could 
be rearranged to somehow propagate backwards in time, the CNOT gates could be 
implemented simultaneously. Qubits cannot move backwards in time per se, but 
they can be moved arbitrarily using teleportation; notice how the conceptual (but 
unphysical) circuit in figure |TTp is realized by a physical circuit in figure 11 Ancilla 
Bell states |$ + ) = ^ ( 1 00) + |11)) are used to teleport qubits in this rearranged 
CNOT ladder. Teleportation introduces a random Pauli error on the telcported qubit, 
but it is possible to track these errors and their propagation through CNOT gates 
using Pauli frames 49-51 . With this modification, it is possible to implement the 



Jordan-Wigner transform in constant time, which removes one of the bottlenecks to 
high-speed second-quantized simulation. This method could be adapted to implement 
other Clifford-group circuits in constant time, at the expense of requiring enough 
ancilla Bell states. 



3.4- Resource analysis for ground-state energy simulation of LiH 

Using the hypothetical quantum computer from Section [2~4j we examine the resources 
required to perform simulation in second-quantized form. Estimates of the number 
of qubits required for various instances of second-quantized chemical simulation have 
been reported previously f4l|T2] , so we focus instead on the execution time and effort 
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Method 


Description 


Advantages 


Disadvantages 


Finite- 
precision 
cutoff in 
second- 
quantized 
integrals 


Neglect to 
implement integral 
terms below a 
chosen cutoff in the 
algorithm 
execution. 


Second-quantized 
circuit complexity 
is reduced in both 
depth and number 
of T gates. 


None if cutoff 
threshold is below 
gate approximation 
accuracy. 


Jordan- 
Wigner 
transform 
using 

teleportation 


Use a teleportation 
circuit to 
implement 
Jordan- Wigner 
transform in 
constant time. 


Second-quantized 
circuit depth 
reduces to at most 
0(M 4 ) from 
0(M 5 ). 


Teleportation 
circuit requires at 
most 3M - 4 
qubits instead of 
M (only during 
Jordan- Wigner 
transform) . 



Table 3. Summary of methods for efficient second-quantized chemical simulation. 
The quantity M is the number of basis functions used in the representation of the 
chemical problem; larger basis sets produce more accurate results at the expense 
of greater circuit complexity. 



to prepare fault-tolerant gates (here we consider number of T gates). Figure 12 shows 
both the circuit depth and number of T gates required to simulate LiH in the STO-3G 
basis as a function of rotation accuracy threshold e ma x) for 1023 simulated time steps. 
The precision in the readout is proportional the number of time steps simulated. The 
energy estimate in this simulation has 10 bits of precision, and in general, 2™ — 1 steps 
are required for n bits of precision. If we assume that the duration of a single quantum 
gate is 1 ms (cf. Ref. [51 ), then the total execution time of the simulation ranges 
from ~ 5.6 hours using PARs to ~ 3.8 years using Solovay-Kitaev rotations. 

The number of T gates in figure [12] serves as an indication of the complexity 
demanded of the quantum computer. Although we do not delve into this matter, 



Refs. 48 51 discuss the importance (and difficulty) of producing these gates. What 



becomes apparent is that using PARs, while very fast, is also more expensive in 
the consumption of T gates than directly implementing Fowler sequences or phase 
kickback. Choosing between such approaches depends on the capabilities of the 
quantum computer, and we discuss this matter in more detail in Section [5j 

To provide an indication of how much execution time in second-quantized 
simulation is devoted to phase rotations, figure [l3| shows the relative ratio of circuit 
depth devoted to implementing rotations versus all other gates for each of the methods 
considered when simulating LiH with rotation accuracy e < 10~ 4 . It is clear here that 
Solovay-Kitaev has such high circuit depth that it cannot be drawn to scale. We 
see also that Fowler and phase kickback sequences require execution times that are 
comparable, whereas PARs actually do not represent the majority of the circuit depth, 
unlike all of the prior methods. This is an encouraging result, because it shows that 
previous examinations that depended on Solovay-Kitaev sequences can be improved 



by orders of magnitude with more efficient phase rotations 21 . We do not consider 
Solovay-Kitaev sequences further in this investigation. The techniques for improving 
second-quantized simulation are summarized in Table [3] 
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Figure 12. Color. Total circuit depth and T gates for a second-quantized 
simulation of LiH using the STO-3G basis, calculated for different constructions 
of controlled rotations as a function of For a given Gmax , every 

controlled rotation CRz{4>) in the algorithm is approximated with a fault-tolerant 
circuit t/approx with accuracy distance e = dist4 (t/ a pprox, CRz{4>)) such that 
£max- An accuracy threshold e m ax ^10 ^ is used in later analysis. This 
simulation implements all integral terms in the Hamiltonian (see Eqn. |7j). (top) 
Circuit depth using the gate set in TableJT] In this plot, only the mean number of 
gates for PAR circuits is shown, (bottom) T gates required for each method. The 
controllcd-PAR ancillas are produced using controlled rotations constructed using 
Fowler sequences; 6 controlled-PAR ancillas are pre-computed for each rotation, 
and only mean values are plotted. The sudden jump in Solovay-Kitaev resource 
costs is because many controlled rotations in this algorithm have a small angle 
(j> that is approximated with identity gate at low precision, whereas the other 
methods are using a typical sequence length for arbitrary (f>. 
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Clifford Gates Approximate Phase Rotations 
Solovay-Kitaev: 
Phase Kickback: Q 
Fowler Sequences: 

PAR (in-place): Q 



7460 
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Figure 13. Color. The relative amount of time (circuit depth) of a fault-tolerant, 
second-quantized simulation of LiH devoted to Clifford gates {X,Y,Z,H,S,CNOT} 
versus phase rotations that must be approximated. In this example, rotations 
are computed to an accuracy e < 10 -4 . The relative circuit depth of rotations 
calculated by the Solovay-Kitaev algorithm is too large to be drawn to scale here. 
In the case of PAR, the ancillas must be pre-computed with a method such as 
Fowler sequences, but this can be carried out in parallel with other algorithm 
operations. 



4. Simulating chemical structure and dynamics in first-quantized 
representation 

The first-quantized simulation algorithm is in some ways more complex than the 
second-quantized algorithm, but for problems in chemistry larger than a handful of 
particles, it is computationally faster. A first-quantized simulation is essentially a 
finite-difference method for solving the Schrodinger equation. Configuration space is 
discretized into a Cartesian grid, and each particle (e.g. electron) has a wavefunction 
expressed in a quantum register that which encodes a probability amplitude at each 
coordinate on the grid. For example, let us imagine that we form a position-basis 
representation for a single electron on a 2 P x 2 P x TP grid, which requires only 3p 
qubits. Explicitly, the electronic wavefunction is represented as 

2 P -1 

|Ve)= £ c(x,y,z)\x)\y)\z)=J2<r)\r), (H) 

x^y^z—O v 

where c(x, y, z) is the complex probability amplitude for the electron to occupy the 



volume element centered at the position r = (x, y, z). The rightmost part of Eqn. (Ill 
is shorthand that will be used throughout this section. The spin degree of freedom can 
easily be incorporated by including an extra qubit, and to describe a many-electron 



state, the wavefunction has to be properly anti-symmetrized 30 59 . 

To simulate the evolution of a time-independent molecular Hamiltonian % for 
problems in quantum chemistry, we adopt the method given in Refs. [3}[13]. The 



complete Hamiltonian in first-quantized form can be expressed as the sum of the 
kinetic (T) and potential (V) operators 

where the indices i and j run over all particles (electrons and nuclei) of any given 
molecule. Here = |r< — Tj\ is the distance between particles i and j, which carry 
charges and qj respectively. 

Let us outline how first-quantized simulation works before delving into details. 
The core of the algorithm is evolving the Hamiltonian in simulated time, achieved by 
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applying the propagator U(t) = cxp(-i'Ht) (setting h — 1 and assuming H is time- 
independent), which solves the time-dependent Schrodinger equation [2j. This process 
is readily achieved using the split operator approximation, a form of Trotter-Suzuki 
decomposition 1 12 23 60 61 , where the kinetic and potential energy operators are 



simulated in alternating steps as 



U(t) 



,-i'Ht 



e -iT6t/2 e -iV8t e -iT6t/2 



(13) 



The operators e~ lVSt and e~ lTSt are diagonal in the position and momentum bases, 
respectively. One can switch the encoded configuration space representation between 
these two bases by applying the quantum Fourier transform to each spatial dimension 



computer 



40 



. Ref. 



13 



of the wav ef un ction (cf. Eqn. (Ill), which can be efficiently implemented in a quantum 



shows how to construct quantum circuits for operators e lVSt 
and e~ lT5t , and in this section, we complement that work with analysis of fault-tolerant 
versions of these operators. 

To make an algorithm fault-tolerant, its constituent operations must be 
decomposed into circuits of fault-tolerant primitive gates such as those in Table [I] 
Consider the potential energy propagator e~ lVSt as an example. Given a 6-particle 
wavefunction in the position basis as 



IV>i 



E c ( r 



i,r 2 , ...,r b ) |nr 2 ...r 6 ) 



(14) 



where c(-) is the complex amplitude as a function of position in configuration space and 
subscripts correspond to particles in the system, one calculates the phase evolution of 
the potential operator e ~ lVSt in three steps, as follows: 



E 



c(r 1 ,...,r b )|r 1 ...r b )|000...) 



E 

ri, ...,r(, 

E « 

n, ...,!■(, 

E < 

ri,...,r b 



cm, 



,r b ) |ri. ..!■(,) \V(rx, ...,r b )) 



-iV(r u ...,T b )St 



c(ri,...,r 6 ) |ri...r b ) |V(ri, r b )) 



-iV(r u ...,r b )St 



c(r 1 ,...,r b )|r 1 ...r b )|000. 



(15) 



(16) 



(17) 



13 



First, Eqn. (|15) calculates the potential energy as a function of position 
(note that V is diagonal in this basis) and stores the result in a 



coordinates 

quantum register | V(ri, r 2 , r b )) to some finite precision. Appendix B describes how 
to implement this quantum circuit for molecular Hamiltonians. Second, Eqn. ( 16 ) uses 
the |V(ri, r 2 , r b )) register in a "quantum variable" phase rotation that imparts a 
phase to each grid point of the wavefunction in position basis proportional to the 
potential energy at those coordinates. This section discusses how to implement the 
quantum variable rotation using fault-tolerant quantum circuits. Finally, the quantum 
circuit from the first step is reversed in Eqn. (171 to reset the |y(r 1; r 2 , r b )) register 
to 1 000...), also known as "uncomputation" 26 . The sequence of these three steps is 
equivalent to the operation e~ lVSt 

The kinetic energy propagator e~ lTSt is calculated similarly in three steps, with 
the second also being a quantum variable rotation. This operator is diagonal in 
momentum basis, so we transform the representation of the system wavefunction from 
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Figure 14. Quantum variable rotation decomposed into single-qubit rotations 
applied to each qubit in the \8) register consisting of q qubits (see Eqn. | |18[ l). 
\6q— i) refers to the most significant bit in the register \8), etc. 



position basis {x, y, z} to momentum basis {k Xl k y , k z } by applying a QFT along each 



spatial dimension of the encoding in Eqn. ( 11 ). This form permits efficient calculation 



of the kinetic energy operator 13 , which is described in Appendix B 



4-1- Quantum variable rotation 

The phase rotation subroutine in the first-quantized simulation algorithm imparts a 
quantum phase to each binary-encoded phase state in a superposition \8) = yV c 
stored in a quantum register (c,-'s are arbitary complex amplitudes) 
the transformation 



Formally, it is 



j 



\<t>j 



E 



(18) 



which generalizes the operation in Eqn. 



varies with implementation, as explained below and in Appendix B Each < 



(16) using £, which is a scaling factor that 

< 1 



is a finite binary representation of a rotation on the unit circle encoded in a quantum 



register. Eqn. (18) is the quantum variable rotation (QVR), which is essential to first- 
quantized simulation. We show how to implement this phase rotation subroutine using 
phase rotations from previous sections, as well as a new construction based on phase 
kickback. At the end of the section, we analyze the resource costs of these methods. 

To produce a QVR, various circuit manipulations are possible. The first is to 
simply apply a single-qubit rotation to each qubit in register \6), as shown in figure 14 
Each individual rotation could be created using the techniques in Section [2j Since a 
i-bit QVR requires t separate bitwise rotations, we require that each rotation has 
accuracy e/t to achieve accuracy e in the QVR, where we have used the fact that 
the distance measure in Eqn. ([5| obeys the triangle inequality 29 . If the QVR is 
controlled by another qubit {e.g. if the propagator is controlled by a "simulated time" 
qubit as in figure [I]), then the gates in figure 14 are replaced with controlled rotations 
from Section [XT] In either case, one must know the quantity £ in advance to compile 
these gates; typically, £ is a product of physical constants and simulation parameters, 
as explained in |Appendix B| 

The QVR can also be produced in a more elegant manner using phase kickback. 
Rather than apply bitwise gates to the \9) register, we instead use the entire register 
in a modified version of the phase kickback procedure. First, we require a binary 
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Figure 15. Quantum variable rotation using phase kickback. This circuit 
implements the operation in Eqn. | |18| with scaling factor [£], which has been 

y( fe [«l)) (s 



Appendix A |. A 



"programmed" into the phase kickback register 17' 

control qubit |t) is included for illustration. This figure shows how the bits in the 
adder are aligned for different cases, (a) The register \8) is shifted down p bits 
since p > 0. 80 is the least-significant bit in the \9) register, etc. The input qubits 
above \8) are logical zeros, (b) The register \8) is shifted up \p\ bits since p < 0. 
In this case, the \p\ most-significant bits of \8) are not used in the adder. 



approximation to £, denoted [£]. Second, we define some quantities that describe 
this quantum circuit. Let m denote the number of significant bits in [£], minus the 
number of trailing zeros. Define w = [log 2 [.f ] J , or in other words, w is the largest 
integer such that 2 W < [£]. Denote p = (m — 1) — w, which is how many bits we 
must shift [£] up to produce an odd integer (if p < 0, we shift down). Following 
Eqn. (18), let q be the number of qubits in \6). Define integers fcrgi = (2 P )[£] and 
= {2 q )4> for some arbitrary <f> G [0, 1) represented using q bits. Third, we construct 
a phase kickback ancilla register | ^C fe [ei ) ^> of size n = p + q qubits, using techniques 
in |Appcndix A| Finally, we perform phase kickback with an addition circuit between 
registers \&) and |7( fc t«i^ (in-place addition applied to |7^ fc[41 -')), except this time the 
\6) register is shifted in one of two ways, as shown in figure 15 If p > 0, then the 
1 0) register is shifted down by p qubits, and the \0) register is padded with p logical 
zeros at the most-significant side of the adder input (figure 15 1). If p < 0, then \0) 
is shifted up by \p\ qubits, so that the \p\ most-significant bits of \9) are not used in 
the adder (figure [15]d) . If n < 0, then all rotations are identity and no QVR circuit is 
constructed. 

We now confirm that this procedure produces the intended quantum variable 
rotation. Using Eqn. ([3]), we see that the above procedure will implement a phase 
rotation of 



^Z c i \<t>j) 



\4>j 



(19) 
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■ ▼ ■ PAR (Single-Qubit Bitwise) 

■ ♦ Phase Kickback (Single-Qubit Bitwise) 
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Figure 16. Color. Number of T gates required to produce a QVR with 
various methods, assuming § = 1 and number of significant figures is chosen 
to satisfy the approximation error e. The special-purpose "quantum variable" 
phase kickback clearly requires the least circuit effort, and the asymptotic scaling 
of T gates is linear in loge for this approach and super-quadratic for the others. 
The circuit depth for Fowler or phase kickback approaches is equivalent to the 
comparable single-qubit rotation; however, the PAR must succeed across all 
individual rotations for this circuit to succeed, so the mean circuit depth increases 
slightly. In the above, 10 rounds of PAR ancilla are pre-computed for each single- 
qubit rotation in the QVR. 



Since fcrji = (2 P )[£] and — (2 9 )0, this is the same as 



^c,|0,)^^e 2 ^c,|^), (20) 

3 



which is equivalent to Eqn. ( 18 ) using our finite representation for £. As before, if we 
require a controlled-QVR, then the adder can be controlled by an external qubit, which 
is the configuration shown in figure 15 This "quantum variable" phase kickback uses 
substantially fewer T gates than the bitwise approach, as shown in figure [16] while 
having comparable circuit depth. Moreover, since there is only one phase rotation 
instead of many, it does not have to be as accurate as the individual rotations in 
figure [14] must be to achieve the same total accuracy in the QVR. 

It may seem inefficient to produce a different phase kickback register for each QVR 
operation, but three properties of the first-quantized simulation algorithm make this 
approach efficient. First, there are only a polynomial number of such operations: for b 
particles, there are b QVRs in the kinetic energy operator and hb{b — 1) QVRs in the 
potential operator. Second, many of these QVRs have the same scaling factor £, so a 
phase kickback register can be reused many times without modification. For example, 
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the scaling factor in the kinetic energy operator is the same for all electrons (which 
have the same mass). Third, the |7( fc [s])^) registers can be calculated independently 
of other operations in the algorithm, so the impact of this process on circuit depth is 
minimal. 

This phase kickback QVR has interesting applications to other useful quantum 
circuits. It can be used to make a fault-tolerant quantum Fourier transform (QFT); 
one replaces each block of controlled rotations with a controlled-QVR. As before, 
this approach uses substantially fewer T gates than an equivalent circuit where 
each controlled rotation in the QFT is implemented individually with techniques in 



Section 3.1 and the same methods can be applied to an approximate QFT [62] by 
simply truncating the size of the \ register. The phase kickback QVR can also be 
used to efficiently produce ancillas for PAR if the particular rotation Rz{4>) is required 
frequently, which can have applications to second-quantized simulation. If we denote 
the state |+) = 4g (|0) + |1)), then an input state of |+) |+) |+) ... will be transformed 
using QVR (with appropriate £) into the set of ancillas for PAR, but requiring only 
one addition circuit for the entire set instead of a phase kickback addition or Fowler 



sequence for each ancilla qubit, which can be seen by comparing figure 14 with the 
ancilla preparation in figure 4 Creating the necessary for this process is costly, 

so there is a net gain only iia certain rotation angle <f> is required often. 



4-2. Improved parallelism in potential energy operator 

The majority of the circuit effort in first-quantized simulation is devoted to calculating 



the potential energy 13 . We introduce here a technique to substantially speed up 
the calculation of the potential energy operator V, which is simply the sum of the 
Coulomb interactions Vij — 4 ^ J r . between all pairwise combinations of the electrons 
and nuclei. Note that this operator is a function of the positions of the system 
particles only, so it is diagonal in the position basis |rir2...r(,). This fact means 
that all terms Vij commute with each other, so they may be calculated in any order. 
Moreover, there are many sets of the Vij operators that are disjoint, which means 
that each particle in the system is acted on by just one operator in the set. Using 
this observation, for example, we may calculate the Coulomb interaction V\2 between 
particles 1 and 2 at the same time as V34 between particles 3 and 4, and so on. In 
general, for a system of b particles, there are \b(b — 1) pairwise interactions, and we 
can perform [_§J pairs in parallel, which means that a potential energy operator with 
0(b 2 ) terms can be calculated in 0{b) time. This parallelism can increase the speed 
of simulation significantly since evaluation of the potential energy dominates resource 
costs [5l] . 

The potential operator calculation can be further parallelized to achieve 0(log&) 
or 0(1) (constant) circuit depth. Exploiting the fact that all Vij are diagonal in 
position basis (and hence commute), we use transversal CNDT gates to copy the data 
in position-basis particle wavefunction onto multiple empty quantum registers. For a 
single particle, this process is 

( c(x,y,z)\x)\y)\z) \ 000... 000 

\x,y,z=0 ) 
2 P -1 

-+ J2 c(x,y,z)(\x)\y)\z))(\x)\y)\z))(\x)\ y )\z))... (21) 

x,y,z=0 
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For 6 particles, the copy operation is performed 6—2 times (for 6—1 total copies), which 
can be fanned out using a binary tree with depth |~log 2 (6— 1)] ; constant depth can be 
achieved in some quantum computer architectures which support one-control/many- 
target CNOTs [5Tj[52] or in general architectures using a teleportation circuit similar to 



those described in Section 3.3 This approach is similar to that employed in Ref. 39 
to produce a parallel circuit for the QFT. The system wavefunction is now expanded 
to the state 

IVWandH E c(v 1 ,...,r b )(\r 1 })^ b - 1 K..(\r b ))^ b - 1 \ (22) 

ri,. ..,!•(, 

which requires 0(b 2 ) memory space. Note that this process is not cloning — the 
position-basis particle registers are still entangled to one another. With multiple 
accessible copies of each particle's position-basis information, the particles are matched 
in all 6(6 — 1) possible pairings, and the potential energy operator applied to each 
pairing in parallel, which can be accomplished in constant time, but still requires 
0(b 2 ) circuit effort. After each of the potential energy operators Vij kicks back a 
phase, the excess copies of each particle wavefunction are uncomputed by reversing 
the tree of CNOTs above. The preceding example demonstrates that it is possible to 
calculate V in time which is sub-linear in the number of particles, even if each Vij is 
treated as a black box operator. In practice, more efficient circuits can be produced 
by generating the internal "workspace" registers of V in parallel, rather than making 



copies of the input registers J2 Tl r b c ( r lj — > r f>) l r i--- r b) ( see Appendix B| 



4-3. Resource analysis for first- quantized molecular simulations 

The advantage of using the first-quantized approach is that the errors of the simulation 
are systematically improvable by increasing the spatial precision of the wavefunction 
and the temporal precision of the timestcps. However, calculating kinetic and potential 
energy interactions requires quantum arithmetic circuits and phase rotations, which 
together require substantial resources in terms of fault-tolerant gates and qubits. 



Figure 17 shows two versions of first-quantized simulation using the techniques for 
parallel calculation of potential energy from the previous section. Although constant- 
depth evaluation of the Hamiltonian is possible, it requires a significantly larger 
quantum computer to achieve the parallel calculations, so this implementation is 
probably best suited to large-scale quantum computers. 



Examining figure 17 note that the circuit depth at 6 particles (e.g. LiH) 
is comparable to that of the equivalent PAR-based second-quantized simulation in 
figure [l2] while requiring many more qubits, indicating that first-quantized simulation 
is more appropriate for larger molecules than LiH, since the circuit depth for first- 
quantized simulation is asymptotically less than second-quantized as particle number is 
increased [12] . Moreover, these calculations have assumed that the spatial precision is 
10 qubits for any molecules with 2 to 20 particles. As the size of the molecule increases, 
the number of qubits for each dimension of the encoded wavefunction will have to 
increase as the molecule itself is spatially larger. One may also choose to increase 
spatial resolution to achieve a higher-precision simulation. Each of the methods we 
propose for improving first-quantized simulation are summarized in Table [4] 
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Figure 17. Color. Circuit depth for two instances of first-quantized simulation. 
The in-place calculation of potential energy computes each pairwise Coulomb 
interaction in sets of non-overlapping particle pairs, and both the depth and 
number of qubits required increase linearly with number of particles. The 
fully-parallel calculation creates many copies of the wavefunction to permit the 
potential energy to be determined in constant time, at the expense of requiring 
substantially more application qubits (quadratic in number of particles). In both 
cases, the wavefunction precision along any spatial dimension is 10 qubits, and 
the simulation uses 1023 time steps for 10 bits of precision, or ~ 3 significant 
figures. 



5. Comparing simulation methods 

The prior sections illustrate that there exist numerous ways to simulate a molecular 
Hamiltonian, including choices between encoded representation in a quantum 
computer and the way fault-tolerant rotations are prepared. The final result one 
desires to know is, Which method is best? Determining an optimal approach is 
subjective to the quantum computing resources available, so in this section we describe 
how to make such a decision. 

To visually compare different implementations of a simulation algorithm, we plot 
the efficient frontier for each method in a plane defined by machine size (qubits) on 
the a;-axis and execution time (circuit depth) on the y-axis. The efficient frontier 
is the set of all points (size, depth) such that for each achievable machine size, the 



(achievable) depth is minimized, and vice versa. As an example, figure 18 shows the 
efficient frontiers of various implementations of a LiH simulation. 

To determine the optimal implementation, one specifies a cost function g(x,y), 
which associates with any point (x,y) a "cost" to implement simulation using these 
parameters. For example, cost could be the estimated engineering challenge to produce 
a quantum computer of size x qubits combined with a penalty for the execution time 
of y gates, which is a measure of performance. Minimizing the cost function along 
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Method 


Description 


Advantages 


Disadvantages 


LldllL UIll 

variable 
rotation 
(QVR) 


Use phase kickback 
to apply a 
fault-tolerant 

r \ no oo rAf otuin 4" f\ 
JJIiabc iOLdLlUll LU 

each element in a 
superposition, 
proportional to the 
binary-encoded 

Tftinir 1 nf tnat 

element. 


Reduces 
complexity of 
first-quantized 
simulation. Circuit 
depth is essentially 
the same as 
single-qubit phase 
kickback, but the 
QVR requires 
substantially fewer 
T gates than the 
method in 
figure 14 


Not the minimal 
depth achievable, 
such as with PARs. 


Parallel 
evaluation of 
potential 
energy terms 


Reduce potential 
operator circuit 
depth using 
parallel 
computation. 


Shorter circuit 
depth than 
calculating all 
\b(b — 1) terms 
individually. 


Concurrent 
computation 
requires more 
T gates 

simult aneously. 


Teleportation 
circuit 

expansion for 

potential 

operator 


Use a teleportation 
circuit to 
"control-copy" 
position-basis 
wavefunction in 
constant time. 


Potential operator 
can be evaluated in 
a time which is 
independent of 
problem size. 


Circuit size in 
qubits increases to 
0{b 2 ) from 0(6). 



Table 4. Summary of methods for efficient first-quantized chemical simulation. 
The quantity b is the number of particles in the chemical problem, which influences 
algorithm resource costs. 



each efficient frontier gives the optimal set of parameters for that particular method, 
and minimizing over all efficient frontiers gives the best implementation that is known 
to be achievable. 

For the various implementations for a LiH simulation in figure |18| it seems likely 
that one would choose between the compact algorithm with Fowler gate sequences or 
the faster version with PAR sequences, which requires additional qubits to compute 
the necessary ancillas. First-quantized can potentially deliver the fastest execution 
time here, but for this problem the number of qubits required is substantially greater. 
Still, first-quantized gains an appreciable performance advantage if the number of 



particles is increased or if one moves to simulating time-varying dynamics 12 . 

Naturally, future algorithm advancements could produce new frontiers that are 
more desirable for a given cost function. In general, one would like to make such 
comparisons, which can inform design decisions for quantum hardware, with full 
consideration of the cost to implement error correction, produce non-Clifford group 
gates (e.g. T gates), and so forth. However, such comprehensive system analysis is 



beyond the scope of this investigation; see Refs. 21,48 51 63 for further details 
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Figure 18. Color. The efficient frontiers for various implementations of 
simulating LiH ground state energy on a quantum computer. Each star data 
point corresponds to the equivalent method in figure |12| at rotation accuracy 
e maJ < 10~ 4 ; similarly, first-quantized simulations use QVRs with the same 
accuracy. The PAR frontier (purple) and first-quantized frontier (brown) have 
adjustable parameters that reduce circuit depth through parallel computation 
at the expense of increased system size (application qubits). For example, the 
PAR-based algorithm only achieves the circuit depth shown in figure |T2] when the 
system has 68 qubits, which is the yellow star here. 



6. Conclusions 

This paper examines the methods required to simulate chemistry on a fault-tolerant 
quantum computer. A crucial operation in these algorithms is the production of phase 
rotations, and several approaches — phase kickback, gate approximation sequences, 
programmable ancilla rotations (PARs), and quantum variable rotations (QVRs) — 
are analyzed. First, it should be clear that sequences generated by the Solovay- 
Kitaev algorithm are not nearly as efficient as the alternatives, phase kickback and 
Fowler sequences. Fowler sequences are the shortest for a fault-tolerant single-qubit 
rotation, but the classical computing effort required to determine such sequences 
becomes intractable for high-precision (e.g. e < 10~ 6 ) rotations. Phase kickback 
is a versatile technique that produces rotations comparable to Fowler's algorithm in 
resource usage, with the former having circuit depth O(loge) or O(logloge) gates and 
requiring O(loge) T gates. Furthermore, the underlying circuit for phase kickback is 
an adder, which can be determined using efficient classical algorithms (unlike Fowler's 
algorithm), and phase kickback can be extended more readily to QVRs. The PAR 
allows the quantum algorithm to achieve exceptionally low-circuit-depth rotations, 
at the expense of computing ancillas in advance (which is less efficient in terms of 
T gates). Finally, the QVR is particularly useful for first-quantized simulation. The 
relative merits of the methods for producing phase rotations are compared in Table [2] 
This investigation also examined two variants of the simulation algorithm, second- 
quantized and first-quantized, whose primary difference is the way wavefunctions 
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are encoded and operated upon. Generally speaking, second-quantized is a more 
compact representation, requiring fewer qubits, but it requires asymptotically longer 
execution times than first-quantized, measured in circuit depth, as the problem size 
increases in terms of independent particles to simulate. Our results provide a more 
nuanced way to compare these methods by explicitly considering the possible ways 
to make the algorithms compatible with fault-tolerant quantum computing and the 
resulting resource costs incurred. We have also introduced several improvements to 
the simulation algorithms. In the second-quantized approach, one can neglect some 
of the integral terms smaller in magnitude than a cutoff threshold, implement the 
Jordan- Wigncr transform in constant time, and use PARs to substantially reduce 
circuit depth, at the expense of requiring parallel production of the pre-computed PAR 
ancillas. In first-quantized, we demonstrated how to produce QVRs with arbitrary 
scaling factor, as well as how to parallelize the calculation of the potential energy to 
time linear in system size (without increase in qubits) or to constant time (requiring a 
number of qubits that grows quadratically instead of linearly with number of particles 
simulated). The methods we present for efficient chemical simulation on quantum 
computers are summarized in Tables [3] and [4] 

Although we have focused on simulating quantum chemistry, these methods 
can be extended to simulating other Hamiltonians on quantum computers, such as 



spin lattice models 64 , lattice gas automata 65 and lattice gauge theories 66 



or quantum chaos theories 67 . Moreover, the fault-tolerant rotations could find 
application in other quantum algorithms, including any which require a Fourier 
transform. This investigation provides a flexible set of methods for making simulation 
algorithms practically realizable on fault-tolerant quantum computers. 
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Appendix A. Transforming the phase kickback register 



In some situations it is useful to change the k- value in 7^), the phase kickback 
ancilla register (see Eqn. (|2|). Without control over k, the quantum variable rotation 
in Section 4.1 would require solving Eqn. Q in a quantum circuit; this step would in 
turn require a multiplication operation, which can be expensive in terms of quantum 
gates. We deviate here from Ref. 38 and propose a simple way to avoid the expensive 
operations associated with modular multiplication. The specific ancilla state |7^) 
does not require an additional circuit to solve Eqn. Q , so we create this state explicitly 
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using a simple transform 7^) — > 7^)- We begin by factoring the register 
into individual qubits (note that all such states are separable, i.e. not entangled): 

\i (k) ) = ^Y,^ lkv/N \y) 

V y=o 

= -L (JO) + e - 2mk / 2 ® (jO) + e - 2 " fc / 4 |1)) ® . . . 

® (|0) +e- 2 " fe / 2 " |1>) . (A.l) 

We convert this state into 7^) with a series of single-qubit phase rotations using 
the controlled addition circuit from figure [2j Since k is odd, the first bit of our ancilla 
register is always 771 (|0) — |1)). The next bit must be rotated by the phase gate 
Rz(ft(k — 1)), which is either identity or depending on k. In general, the corrective 
gate applied to the m th bit is the phase rotation Rz {2n J^-i ) ; which may be produced 
using the preceding m — 1 bits of the ancilla register and phase kickback. By iterating 
through all qubits in the register, we complete the transformation with circuit depth 
0(n 2 ) gates or less, depending on the type of adder used in phase kickback. This 
procedure can be generalized to any transformation 7'*)) — > \"/^) for odd integers 
1 < k, I < 2™, where n is the number of bits in the phase kickback register. 



Appendix B. Quantum circuits for potential and kinetic energy operators 
in first-quantized molecular Hamiltonians 

First-quantized molecular simulation represents the simulated system wavefunction on 
a Cartesian grid, and the Hamiltonian is calculated with digital arithmetic acting on 
this coordinate space. Similar methods were discussed in the supplementary material 
for Ref. [13] , but we update this analysis for the quantum variable rotation (QVR) 
introduced in this work. The potential energy operator is diagonal in position basis, 
and it is the sum of Coulomb interactions between electrons and nuclei in the system: 
y= ?Yh&%> wh ere 

* - St (s^r) <■»> 



and qj is the charge of particle j. The prefactor on the RHS of Eqn. (B.l ) is a constant 



for any given pair of particles, and we can later encode this scaling factor into the 
QVR. What remains is to calculate 1 — ^ — r over the position-encoded wavefunction. 

I r i r j I 

Each position register can be decomposed in Cartesian components |r) = \y) \z), 
so for a pair of particles we calculate 

\n 2 ) = {xi - Xjf + (yi - yjf + (zi - z-jfj. (B.2) 

The required multiplication operations can be implemented using quantum adder 



circuits. Next the quantity 
with the iterative equation 
1 



is calculated using the Newton-Raphson method 



„+i = -a n (3 - a n 2 r v 2 ) . (B.3) 



With suitably chosen initial value do, Eqn. (B.3 ) converges within 5 iterations at 32-bit 
arithmetic, and typically less precision is required for simulation. The register 1 
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is used in a QVR with scaling factor £ = from above, where St is the time-step 

of this simulated evolution and an additional factor 1/2tv comes from Eqn. (18 1. Note 

that each component of ^\ is entangled to a position-basis component of the system 

wavefunction, so the QVR effectively kicks back a phase to the wavefunction. Each 
of the steps prior to the QVR is uncomputed, and the net effect of this sequence 
of operations is to implement the potential energy propagator e~ lh Vij5t , as in 
Eqns. jl5}{l7}. 

The kinetic energy operator is calculated using a similar approach as the potential 
energy. The kinetic energy is the sum of individual kinetic energy operators on each 
particle: T = J2j Tj , where 

p_ _ ^ 

J 2m j 2rrij 

The quantity wij is the mass and kj = Pj/h is the non-relativistic wavevector 
corresponding to particle j. By performing a quantum Fourier transform along 
each spatial dimension of the wavefunction, the system representation is transformed 
from position basis to momentum basis: {x, y, z} — > {k x ,k y ,k z }. This form permits 
immediate calculation of magnitude squared of the wavevector: 

||k| 2 ) - \k x 2 + k y 2 + k z 2 ) . (B.5) 

The ||k| 2 ) register is used in a QVR with scaling factor £ = 4 ^ . Afterwards, the 
intermediate registers used in the calculation of ||k| 2 ) are uncomputed, and the end 
result is the operator e~ lh lrp i St . 
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